Vertex-Context Sampling for Weighted Network Embedding
Abstract
Network embedding methods have garnered increasing attention because of their effectiveness in various information retrieval tasks. The goal is to learn low-dimensional representations of vertexes in an information network while capturing and preserving the network structure. Critical to the performance of a network embedding method is how the edges/vertexes of the network are sampled for the learning process. Many existing methods adopt a uniform sampling method to reduce learning complexity, but when the network is non-uniform (i.e., a weighted network) such uniform sampling incurs information loss. The goal of this paper is to present a generalized vertex sampling framework that works seamlessly with most existing network embedding methods to support weighted instead of uniform vertex/edge sampling. For efficiency, we propose a delicate sequential vertex-to-context graph data structure, such that sampling a training pair for learning takes only constant time. For scalability and memory efficiency, we design the graph data structure so that it requires no additional space. Moreover, the proposed framework can be used to implement extensions that feature high-order proximity modeling and weighted relation modeling. Experiments conducted on three datasets, including a large-scale commercial one, verify the effectiveness and efficiency of the proposed weighted network embedding methods on various tasks, including word similarity search, multi-label classification, and item recommendation.
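The constant-time weighted sampling described in the abstract can be illustrated with a minimal sketch. It uses the alias method, a standard technique for O(1) draws from a discrete distribution; this is an assumption made for illustration and not necessarily the paper's vertex-to-context data structure. All names here (build_alias_table, sample_index, WeightedPairSampler) are hypothetical.

```python
import random

def build_alias_table(weights):
    """Build alias-method tables for O(1) sampling proportional to weights."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]  # mean of scaled weights is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # keep s with this probability
        alias[s] = l          # otherwise fall through to l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:   # leftovers are (numerically) exactly 1
        prob[i] = 1.0
        alias[i] = i
    return prob, alias

def sample_index(prob, alias, rng=random):
    """Draw one index in constant time: one uniform slot, one biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

class WeightedPairSampler:
    """Hypothetical two-level sampler: vertex by total weight, then context."""

    def __init__(self, edges):
        # edges: iterable of (vertex, context, weight) with positive weights
        adj = {}
        for v, c, w in edges:
            adj.setdefault(v, []).append((c, w))
        self.vertices = list(adj)
        self.vertex_table = build_alias_table(
            [sum(w for _, w in adj[v]) for v in self.vertices])
        self.contexts = {v: [c for c, _ in adj[v]] for v in self.vertices}
        self.context_tables = {
            v: build_alias_table([w for _, w in adj[v]]) for v in self.vertices}

    def sample(self, rng=random):
        # Each training pair costs two alias draws, independent of graph size.
        v = self.vertices[sample_index(*self.vertex_table, rng)]
        c = self.contexts[v][sample_index(*self.context_tables[v], rng)]
        return v, c
```

Under these assumptions, a pair (v, c) is drawn with probability proportional to its edge weight: the vertex is chosen in proportion to its total outgoing weight, then the context in proportion to the edge weight among that vertex's neighbors. Unlike the structure the abstract describes, this sketch does allocate extra tables, so it illustrates only the constant-time property, not the memory claim.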
Similar resources
CANE: Context-Aware Network Embedding for Relation Modeling
Network embedding (NE) is playing a critical role in network analysis, due to its ability to represent vertices with efficient low-dimensional embedding vectors. However, existing NE models aim to learn a fixed context-free embedding for each vertex and neglect the diverse roles when interacting with other vertices. In this paper, we assume that one vertex usually shows different aspects when i...
Approximating Betweenness Centrality
Betweenness is a centrality measure based on shortest paths, widely used in complex network analysis. It is computationally-expensive to exactly determine betweenness; currently the fastest-known algorithm by Brandes requires O(nm) time for unweighted graphs and O(nm + n log n) time for weighted graphs, where n is the number of vertices and m is the number of edges in the network. These are als...
Phishing website detection using weighted feature line embedding
The aim of phishing is to obtain users' private information without their permission by designing a new website that mimics a trusted website. Information technology specialists do not agree on a unique definition of the discriminative features that characterize phishing websites. Therefore, the number of reliable training samples in phishing detection problems is limited. M...
A Novel Weighted Distance Measure for Multi-Attributed Graph
Due to the exponential growth of complex data, graph structure has become increasingly important for modeling various entities and their interactions, with many interesting applications including bioinformatics, social network analysis, etc. Depending on the complexity of the data, the underlying graph model can be a simple directed/undirected and/or weighted/un-weighted graph to a complex graph (aka ...
Heterogeneous Information Network Embedding for Meta Path based Proximity
A network embedding is a representation of a large graph in a low-dimensional space, where vertices are modeled as vectors. The objective of a good embedding is to preserve the proximity (i.e., similarity) between vertices in the original graph. This way, typical search and mining methods (e.g., similarity search, kNN retrieval, classification, clustering) can be applied in the embedded space wi...
Journal:
- CoRR
Volume: abs/1711.00227
Publication date: 2017